Written
Corpus,
Language Type:
Bilingual
Languages:
Guarani Spanish
Availability:
From Owner
License:
Size:
228000 words
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Development of a Guarani - Spanish Parallel Corpus
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Luis Chiruzzo | Guarani - Spanish Parallel Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GByte
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
<Not Specified>
Size:
22.10G tokens
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | /N |
Documentation:
Yes, on the website.
Written
Lexicon,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
CreativeCommons Attribution 4.0 International
Size:
41 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | /N |
Documentation:
Yes, on the website.
Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
5.7 MByte
Production Status:
Newly created-in progress
Use:
Information Extraction, Information Retrieval
- Paper title: NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Salvador Lima Lopez | NUBes | /N |
Documentation:
Annotation guidelines are also available (in Spanish) in the same site
Multimodal/Multimedia
Corpus,
Language Type:
Monolingual
Languages:
Adyghe Albanian Ancient Greek Arabic Armenian Asturian Basque Belarusian Bulgarian Catalan Church Slavic Classic Syriac Classical Armenian Czech Danish Dutch English Estonian Faroese Finnish Georgian German Gothic Hindi Hungarian Icelandic Ingrian Irish Kabardian Kalaallisut Kannada Kazakh Khakas Latin Latvian Lithuanian Livonian languages Low German Lower Sorbian Macedonian Maltese Middle French Middle High German Middle Low German Modern Greek Neapolitan Northern Sami Occitan Old English Old French Old Irish Old Saxon Pashto Persian Polish Portuguese Romanian Slovenian Spanish Swedish Tibetan Turkish Turkmen Ukrainian Urdu Veps Votic Welsh
Availability:
Freely Available
License:
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Size:
557.3 MByte
Production Status:
Newly created-in progress
Use:
Morphological Analysis
- Paper title: Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Eleni Metheniti | Wikinflection Corpus | /N |
Documentation:
https://github.com/lenakmeth/Wikinflection-Corpus/blob/master/README.md
Written
Corpus,
Language Type:
Multilingual
Languages:
English French Portuguese Spanish
Availability:
Freely Available
License:
Size:
300 Other
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors’ Abstract Writing Practice
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aurélie Névéol | MEDLINE parallel corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic Azerbaijani Belarusian Bulgarian Catalan Danish English Estonian Filipino Finnish Hindi Hungarian Indonesian Irish Italian Japanese Kazakh Korean Latvian Lithuanian Mongolian Norwegian Polish Portuguese Russian Serbian (Latin) Slovenian Spanish Swedish Tamil Turkish Ukrainian Urdu Uzbek Vietnamese ces deu ell fas fra isl kat mkd nld ron slk sqi zho
Availability:
Freely Available
License:
GNU-GPL v.3
Size:
45 billion words
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Geographically-Balanced Gigaword Corpora for 50 Language Varieties
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jonathan Dunn | GeoWAC | /N |
Documentation:
https://github.com/jonathandunn/earthlings
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
CreativeCommons (Attribution-ShareAlike 4.0 International)
Size:
2.0 GByte
Production Status:
Newly created-finished
Use:
Speech Synthesis
- Paper title: Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alexander Gutkin | Crowd-sourced high-quality Argentinian Spanish speech data set by Google | /N |
Documentation:
README file in English.
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
CreativeCommons (Attribution-ShareAlike 4.0 International)
Size:
1.5 GByte
Production Status:
Newly created-finished
Use:
Speech Synthesis
- Paper title: Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alexander Gutkin | Crowd-sourced high-quality Chilean Spanish speech data set by Google | /N |
Documentation:
README file in English.




